‘Hashing does not anonymize personal data.’ Is the FTC right?
Posted: August 8, 2024
In a 24 July blog post, the US Federal Trade Commission (FTC) made an unequivocal statement about privacy that has some companies worried.
“No, hashing still doesn’t make your data anonymous,” reads the post’s title.
The post continues: “Companies often claim and act as if data that lacks clearly identifying information is anonymous, but data is only anonymous when it can never be associated back to a person.”
Is the FTC right? And why does it matter?
What’s hashing anyway?
Hashing is a way to change data into a seemingly random set of characters known as a “hash”.
Let’s say you want to hash someone’s Social Security Number (SSN), a unique identifier assigned to people in the US.
You can use widely available software to create the hash with an algorithm such as Secure Hash Algorithm 1 (SHA-1).
Will the hash of an SSN look like the SSN?
An SSN’s hash will look nothing like the original SSN.
An SSN is effectively a random nine-digit number. Here’s a random nine-digit number, which we’ll pretend is “Alice’s SSN”: 938485739
Here’s the corresponding hash, computed with SHA-1, which we’ll call “Alice’s hash”: 573b17b519961dbfb0a08d304e508ec8affb82a0
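To make this concrete, here is a minimal sketch of hashing a nine-digit number with SHA-1 using Python’s standard hashlib module. The helper name sha1_hex is ours, and we assume the number is hashed as plain text with no dashes or spaces. The exact digest depends on how the input is written before hashing, so the output may not match the value quoted above.

```python
import hashlib

def sha1_hex(value: str) -> str:
    """Return the SHA-1 digest of a string as 40 hexadecimal characters."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()

# The made-up number from the text, not a real SSN.
alice_ssn = "938485739"
alice_hash = sha1_hex(alice_ssn)

print(alice_hash)              # 40 hex characters that look nothing like the input
print(sha1_hex("938485738"))   # change one digit and the digest looks unrelated
print(alice_hash == sha1_hex(alice_ssn))  # same input always gives the same hash
```

Note that the same input always produces the same digest. That property matters for everything that follows.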
If I have the hash, can I reverse engineer to discover the SSN?
First, it’s worth mentioning that some hashing methods can be “cracked” by a determined person with a lot of resources. That’s an information security problem, and in this discussion about privacy we can put it aside for the sake of argument.
In principle, you cannot reverse-engineer Alice’s hash and get Alice’s SSN.
Hashing is a one-way street and, assuming the method is sufficiently strong, you cannot travel back down it.
So doesn’t that mean the hash is anonymous?
No, and here’s why.
With SHA-1, the same input always produces the same output, so there’s only one hash of Alice’s SSN: “Alice’s hash” (see above).
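If you have Alice’s hash, it’s very easy to discover Alice’s SSN. Simply generate every possible nine-digit number and hash them all to create a lookup table, loosely referred to here as a “rainbow table”. (Strictly speaking, a rainbow table is a more compact variant of the same idea.)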
In this rainbow table, every possible nine-digit number is in the left column, and each number’s corresponding hash is in the right column.
Each column has around a billion entries. That’s not a lot of numbers for a modern computer to generate.
Some of the numbers in the left column will be real SSNs, and one of those SSNs will be Alice’s SSN.
Take Alice’s hash and find it in the right column of your rainbow table. Look to the left, and you’ll see Alice’s SSN.
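The table lookup described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production tool: the function names are ours, and the demo uses four-digit numbers (10,000 entries) so it runs instantly. The same loop with nine digits would produce a billion entries, which is more than you would want to hold in a Python dictionary but well within reach of a table stored on disk.

```python
import hashlib

def sha1_hex(value: str) -> str:
    """Return the SHA-1 digest of a string as hexadecimal text."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()

def build_lookup_table(num_digits: int) -> dict[str, str]:
    """Map the hash of every num_digits-digit string back to that string."""
    table = {}
    for n in range(10 ** num_digits):
        candidate = f"{n:0{num_digits}d}"  # keep leading zeros, e.g. "0042"
        table[sha1_hex(candidate)] = candidate
    return table

# Demo with a reduced, four-digit space instead of nine digits.
table = build_lookup_table(4)

secret = "0427"                  # hypothetical "identifier" we pretend not to know
leaked_hash = sha1_hex(secret)   # all the attacker sees

recovered = table[leaked_hash]   # "look to the left" in the table
print(recovered)  # 0427
```

Nothing was reversed here. The hash function only ever ran forwards, but because the set of possible inputs is small, running it forwards over every input is enough.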
What about a more complicated piece of data, like an email address?
While generating every nine-digit number is easy, generating every possible email address is not.
There are many more than one billion possible email addresses, so a “rainbow table” attack like this becomes much less feasible.
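A rough back-of-the-envelope comparison shows the difference in scale. The figures for email addresses below are purely illustrative assumptions: we count only addresses at a single domain whose local part is 1 to 12 characters long and uses only lowercase letters and digits. Real addresses allow more characters, longer names, and many domains, so the true space is far larger.

```python
import string

# Every nine-digit string, leading zeros included.
ssn_space = 10 ** 9

# Illustrative assumption: local part of 1 to 12 characters, drawn from
# lowercase letters and digits, at one fixed domain.
alphabet_size = len(string.ascii_lowercase + string.digits)
email_space = sum(alphabet_size ** length for length in range(1, 13))

print(f"nine-digit numbers:        {ssn_space:.2e}")
print(f"restricted email space:    {email_space:.2e}")
print(f"ratio (emails per number): {email_space // ssn_space:,}")
```

Even this heavily restricted email space is billions of times larger than the space of nine-digit numbers.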
However, that doesn’t mean that hashed email addresses are anonymous.
Why aren’t hashed email addresses anonymous?
To explain why hashed email addresses are also not (necessarily) anonymous, we can turn to a real case of FTC enforcement action, taken against a therapy provider called BetterHelp.
BetterHelp wanted to advertise to its users on Facebook, so it had to tell Facebook who its users were.
Rather than giving Facebook its users’ names, BetterHelp sent Facebook a hash of each of its users’ email addresses.
Facebook has hashes of all its users’ email addresses, so it could look up the hashes received from BetterHelp and learn which BetterHelp users were also Facebook users.
As a result, Facebook could serve ads to those users on BetterHelp’s behalf.
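The matching step is simple to sketch. The example below is a hypothetical illustration, not a description of either company’s actual systems: the email addresses are invented, the function name hash_email is ours, and we assume both sides trim and lowercase the address and then apply SHA-256. The logic is the same for any hash function, provided both parties use the same one.

```python
import hashlib

def hash_email(email: str) -> str:
    """Normalize an address (trim, lowercase) and return its SHA-256 hex digest."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Hypothetical customer list held by the advertiser.
advertiser_emails = ["Alice@example.com", "bob@example.com"]

# Hypothetical user list held by the platform.
platform_emails = ["alice@example.com", "carol@example.com"]

# The advertiser uploads only hashes.
uploaded_hashes = {hash_email(e) for e in advertiser_emails}

# The platform hashes its own users the same way and keeps a reverse index.
platform_index = {hash_email(e): e for e in platform_emails}

# Any uploaded hash found in the index identifies a known platform user.
matched = sorted(platform_index[h] for h in uploaded_hashes if h in platform_index)
print(matched)  # ['alice@example.com']
```

The platform never reverses a hash. It only compares the hashes it received with hashes it computed itself, and every match points to a named account.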
Why did the FTC take action against BetterHelp?
BetterHelp told users that because this data was hashed, it was anonymous.
The FTC pointed out that the hashed data was not anonymous because Facebook used it to identify BetterHelp users. As such, BetterHelp had misled its users.
BetterHelp settled for $7.8 million and is still dealing with the fallout from this case.
Why did BetterHelp bother hashing the email addresses?
The hashed email addresses were protected – to some extent – during transmission to Facebook.
If an attacker intercepted the data, they would only have a set of hashes.
As we’ve explained, the attacker could not simply have reverse-engineered the hashes to identify BetterHelp’s users. And creating a “rainbow table” full of every possible email address would be infeasible.
However, the attacker might already hold a huge database of known email addresses and their hashes, and could match the intercepted hashes against it. So even during transit, not all of the email addresses were protected.
But as a security measure, hashing provided protection for at least some of BetterHelp’s users.
As a privacy measure, however, hashing was effectively pointless once Facebook received the data. BetterHelp might as well have sent the email addresses in plain text. Facebook could identify the users either way.
So the FTC is right? Hashing does not anonymize data?
For all practical purposes, the FTC is right. Hashing, in itself, does not anonymize data.
Now, to be fair, there are circumstances in which hashing can anonymize data.
But if you’re using hashes to track people, single people out, or identify people, you’re not dealing with anonymous data.
Of course, tracking people, singling people out, and identifying people aren’t illegal in themselves.
But privacy, data protection, and consumer protection laws apply to these types of activity – whether using hashes, or any other type of data. And you must always be aware of your legal obligations.
At a minimum, you must be honest and transparent with your customers – and not pretend that hashing makes their data anonymous.